102 research outputs found

    Accelerating Deterministic and Stochastic Binarized Neural Networks on FPGAs Using OpenCL

    Full text link
    Recent technological advances have proliferated the available computing power, memory, and speed of modern Central Processing Units (CPUs), Graphics Processing Units (GPUs), and Field Programmable Gate Arrays (FPGAs). Consequently, the performance and complexity of Artificial Neural Networks (ANNs) are burgeoning. While GPU-accelerated Deep Neural Networks (DNNs) currently offer state-of-the-art performance, they consume large amounts of power. Training such networks on CPUs is inefficient, as data throughput and parallel computation are limited. FPGAs are considered a suitable candidate for performance-critical, low-power systems, e.g. Internet of Things (IoT) edge devices. Using the Xilinx SDAccel or Intel FPGA SDK for OpenCL development environment, networks described using the high-level OpenCL framework can be accelerated on heterogeneous platforms. Moreover, the resource utilization and power consumption of DNNs can be further improved by utilizing regularization techniques that binarize network weights. In this paper, we introduce, to the best of our knowledge, the first FPGA-accelerated stochastically binarized DNN implementations, and compare them to implementations accelerated using both GPUs and FPGAs. Our developed networks are trained and benchmarked using the popular MNIST and CIFAR-10 datasets, and achieve near state-of-the-art performance, while offering a >16-fold improvement in power consumption compared to conventional GPU-accelerated networks. Both our FPGA-accelerated deterministic and stochastic BNNs reduce inference times on MNIST and CIFAR-10 by >9.89x and >9.91x, respectively. Comment: 4 pages, 3 figures, 1 table.
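For orientation, here is a minimal NumPy sketch (not the paper's OpenCL kernels; function names are illustrative) contrasting the two binarization schemes the abstract refers to: deterministic binarization takes the sign of each real-valued weight, while stochastic binarization samples +1 with a probability given by the "hard sigmoid" of the weight.

```python
# Minimal sketch of deterministic vs. stochastic weight binarization.
import numpy as np

def binarize_deterministic(w):
    """Deterministic binarization: sign of the real-valued weight."""
    return np.where(w >= 0, 1.0, -1.0)

def binarize_stochastic(w, rng=np.random.default_rng(0)):
    """Stochastic binarization: +1 with probability
    p = clip((w + 1) / 2, 0, 1), else -1."""
    p = np.clip((w + 1.0) / 2.0, 0.0, 1.0)
    return np.where(rng.random(w.shape) < p, 1.0, -1.0)

w = np.array([-1.5, -0.2, 0.0, 0.3, 1.2])
print(binarize_deterministic(w))   # [-1. -1.  1.  1.  1.]
print(binarize_stochastic(w))      # e.g. [-1. -1.  1. -1.  1.]
```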

    Automated machine learning for healthcare and clinical notes analysis

    Get PDF
    Machine learning (ML) has been slowly entering every aspect of our lives and its positive impact has been astonishing. To accelerate embedding ML in more applications and incorporating it in real-world scenarios, automated machine learning (AutoML) is emerging. The main purpose of AutoML is to provide seamless integration of ML in various industries, which will facilitate better outcomes in everyday tasks. In healthcare, AutoML has already been applied to easier settings with structured data such as tabular lab data. However, there is still a need for applying AutoML to the interpretation of medical text, which is being generated at a tremendous rate. A promising approach is AutoML for clinical notes analysis, an unexplored research area representing a gap in ML research. The main objective of this paper is to fill this gap and provide a comprehensive survey and analytical study of AutoML for clinical notes. To that end, we first introduce the AutoML technology and review its various tools and techniques. We then survey the AutoML literature in the healthcare industry and discuss the developments specific to clinical settings, as well as those using general AutoML tools for healthcare applications. With this background, we then discuss the challenges of working with clinical notes and highlight the benefits of developing AutoML for medical notes processing. Next, we survey relevant ML research for clinical notes and analyze the literature and the field of AutoML in the healthcare industry. Furthermore, we propose future research directions and shed light on the challenges and opportunities this emerging field holds. With this, we aim to assist the community with the implementation of an AutoML platform for medical notes, which, if realized, can revolutionize patient outcomes.

    Design and Implementation of BCM Rule Based on Spike-Timing Dependent Plasticity

    Full text link
    The Bienenstock-Cooper-Munro (BCM) and Spike-Timing Dependent Plasticity (STDP) rules are two experimentally verified forms of synaptic plasticity, in which the alteration of synaptic weight depends on the rate and the timing of pre- and post-synaptic action potentials, respectively. Previous studies have reported that, under specific conditions, i.e. when a random train of Poissonian distributed spikes is used as input and weight changes occur according to STDP, the BCM rule emerges as a property of STDP. Here, the applied STDP rule can be either the classical pair-based STDP rule or the more powerful triplet-based STDP rule. In this paper, we demonstrate the use of two distinct VLSI circuit implementations of STDP to examine whether BCM learning is an emergent property of STDP. These circuits are stimulated with random Poissonian spike trains. The first circuit implements the classical pair-based STDP rule, while the second realizes a previously described triplet-based STDP rule. Both circuits are simulated using a 0.35 um CMOS standard model in the HSpice simulator. Simulation results demonstrate that the proposed triplet-based STDP circuit reproduces the threshold-based behaviour of the BCM rule, and that the pair-based STDP circuit shows similar behaviour in generating the BCM rule.
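As context for the stimulation protocol described above, the following is a hedged software analogue of pair-based STDP driven by Poisson spike trains, assuming a standard nearest-spike formulation with illustrative amplitudes and time constants (not the circuit's fitted values).

```python
# Illustrative pair-based STDP update driven by Poisson spike trains.
import numpy as np

rng = np.random.default_rng(1)
A_PLUS, A_MINUS = 0.01, 0.012      # potentiation / depression amplitudes
TAU_PLUS = TAU_MINUS = 20e-3       # time constants (s)

def poisson_train(rate_hz, duration_s, dt=1e-3):
    """Boolean spike train sampled from a Poisson process."""
    return rng.random(int(duration_s / dt)) < rate_hz * dt

def pair_stdp_dw(pre, post, dt=1e-3):
    """Total weight change from nearest-spike pair-based STDP."""
    t_pre = np.flatnonzero(pre) * dt
    t_post = np.flatnonzero(post) * dt
    dw = 0.0
    for tp in t_post:                      # potentiation: pre before post
        before = t_pre[t_pre < tp]
        if before.size:
            dw += A_PLUS * np.exp(-(tp - before[-1]) / TAU_PLUS)
    for tq in t_pre:                       # depression: post before pre
        before = t_post[t_post < tq]
        if before.size:
            dw -= A_MINUS * np.exp(-(tq - before[-1]) / TAU_MINUS)
    return dw

pre = poisson_train(20, 10.0)    # 20 Hz presynaptic input for 10 s
post = poisson_train(30, 10.0)   # 30 Hz postsynaptic activity
print(f"net weight change: {pair_stdp_dw(pre, post):+.4f}")
```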

    Efficient Design of Triplet Based Spike-Timing Dependent Plasticity

    Full text link
    Spike-Timing Dependent Plasticity (STDP) is believed to play an important role in learning and the formation of computational function in the brain. The classical model of STDP, which considers the timing between pairs of pre-synaptic and post-synaptic spikes (p-STDP), is incapable of reproducing the synaptic weight changes seen in biological experiments that investigate the effect of either higher-order spike trains (e.g. triplets and quadruplets of spikes) or the simultaneous effect of the rate and timing of spike pairs on synaptic plasticity. In this paper, we first investigate synaptic weight changes using a p-STDP circuit and show how it fails to reproduce these complex biological experiments. We then present a new STDP VLSI circuit that acts based on the timing among triplets of spikes (t-STDP) and is able to reproduce all of the mentioned experimental results. We believe that our new STDP VLSI circuit improves upon previous circuits, as its ability to mimic the outcomes of biological experiments more closely gives it a greater learning capacity; it can therefore play a significant role in future VLSI implementations of neuromorphic systems.
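The triplet rule the abstract refers to is commonly formulated with pre- and post-synaptic trace variables (Pfister-Gerstner style). Below is a minimal sketch of that general formulation with illustrative constants, not the paper's circuit parameters.

```python
# Hedged sketch of a trace-based triplet STDP rule (t-STDP).
import numpy as np

DT = 1e-3
TAU_PLUS, TAU_X = 16.8e-3, 101e-3     # presynaptic trace time constants
TAU_MINUS, TAU_Y = 33.7e-3, 125e-3    # postsynaptic trace time constants
A2P, A3P = 5e-10, 6.2e-3              # pair / triplet potentiation
A2M, A3M = 7e-3, 2.3e-4               # pair / triplet depression

def triplet_stdp(pre, post, w=0.5):
    """Update weight w for boolean spike trains pre/post of equal length."""
    r1 = r2 = o1 = o2 = 0.0
    for s_pre, s_post in zip(pre, post):
        # exponential decay of all traces each time step
        r1 -= DT * r1 / TAU_PLUS;  r2 -= DT * r2 / TAU_X
        o1 -= DT * o1 / TAU_MINUS; o2 -= DT * o2 / TAU_Y
        if s_pre:                       # depression on a presynaptic spike
            w -= o1 * (A2M + A3M * r2)
            r1 += 1.0; r2 += 1.0
        if s_post:                      # potentiation on a postsynaptic spike
            w += r1 * (A2P + A3P * o2)
            o1 += 1.0; o2 += 1.0
    return w

rng = np.random.default_rng(2)
pre = rng.random(10000) < 20 * DT     # 20 Hz Poisson-like input, 10 s
post = rng.random(10000) < 20 * DT
print(f"final weight: {triplet_stdp(pre, post):.3f}")
```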

    Training Progressively Binarizing Deep Networks Using FPGAs

    Full text link
    While hardware implementations of inference routines for Binarized Neural Networks (BNNs) are plentiful, current realizations of efficient BNN hardware training accelerators, suitable for Internet of Things (IoT) edge devices, leave much to be desired. Conventional BNN hardware training accelerators perform forward and backward propagations with parameters adopting binary representations, and optimization using parameters adopting floating- or fixed-point real-valued representations, requiring two distinct sets of network parameters. In this paper, we propose a hardware-friendly training method that, contrary to conventional methods, progressively binarizes a single set of fixed-point network parameters, yielding notable reductions in power and resource utilization. We use the Intel FPGA SDK for OpenCL development environment to train our progressively binarizing DNNs on an OpenVINO FPGA. We benchmark our training approach on both GPUs and FPGAs using CIFAR-10 and compare it to conventional BNNs. Comment: Accepted at the 2020 IEEE International Symposium on Circuits and Systems (ISCAS).
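One plausible way to realize progressive binarization, sketched below under the assumption of a tanh-based annealing schedule (the paper's exact schedule may differ), is to view a single set of fixed-point parameters through tanh(k*w) and grow the gain k over training, so the effective weights drift smoothly toward {-1, +1}.

```python
# Hedged sketch of a progressive binarization schedule (illustrative only).
import numpy as np

def progressive_binarize(w, epoch, total_epochs, k_min=1.0, k_max=50.0):
    """Soft-binarized view of weights w; gain k grows linearly with training."""
    k = k_min + (k_max - k_min) * epoch / max(total_epochs - 1, 1)
    return np.tanh(k * w)

w = np.array([-0.8, -0.1, 0.05, 0.6])
for epoch in (0, 5, 9):
    print(epoch, np.round(progressive_binarize(w, epoch, 10), 3))
# early epochs: nearly linear; final epochs: values saturate near +/-1
```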

    Design and analysis of efficient QCA reversible adders

    Get PDF
    Quantum-dot cellular automata (QCA), as an emerging nanotechnology, are envisioned to overcome the scaling and heat dissipation issues of current CMOS technology. In a QCA structure, information destruction plays an essential role in the overall heat dissipation, and in turn in the power consumption of the system. Therefore, reversible logic, which significantly controls the information flow of the system, is deemed suitable for achieving ultra-low-power structures. In order to benefit from the opportunities QCA and reversible logic provide, in this paper we first review and implement prior reversible full-adder art in QCA. We then propose a novel reversible design based on three- and five-input majority gates, and a robust one-layer crossover scheme. The new full adder significantly advances previous designs in terms of the optimization metrics, namely cell count, area, and delay. The proposed efficient full adder is then used to design reversible ripple-carry adders (RCAs) of different sizes (i.e., 4, 8, and 16 bits). It is demonstrated that the new RCAs produce 33% fewer garbage outputs, which can be essential in terms of lowering power consumption. This, along with the achieved improvements in area, complexity, and delay, introduces an ultra-efficient reversible QCA adder that can be beneficial in developing future computer arithmetic circuits and architectures.
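The majority-gate construction behind QCA full adders can be checked with a small Boolean model (a hedged sketch, not the paper's cell-level layout): the carry is a three-input majority of the operands, and the sum can be expressed as a five-input majority of A, B, Cin and two copies of the inverted carry.

```python
# Boolean model of a majority-gate full adder as used in QCA designs.
def maj3(a, b, c):
    return int(a + b + c >= 2)

def maj5(a, b, c, d, e):
    return int(a + b + c + d + e >= 3)

def qca_full_adder(a, b, cin):
    cout = maj3(a, b, cin)
    s = maj5(a, b, cin, 1 - cout, 1 - cout)
    return s, cout

# exhaustive check against ordinary binary addition
for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            s, cout = qca_full_adder(a, b, cin)
            assert 2 * cout + s == a + b + cin
print("majority-gate full adder verified for all 8 input combinations")
```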

    Semi-supervised and weakly-supervised deep neural networks and dataset for fish detection in turbid underwater videos

    Get PDF
    Fish are key members of marine ecosystems, and they have a significant share in the healthy human diet. Besides, fish abundance is an excellent indicator of water quality, as fish have adapted to various levels of oxygen, turbidity, nutrients, and pH. Deep Neural Networks (DNNs) can be of great assistance in detecting various fish in underwater videos. However, training DNNs depends heavily on large, labeled datasets, while labeling fish in turbid underwater video frames is a laborious and time-consuming task, hindering the development of accurate and efficient models for fish detection. To address this problem, we first collected a dataset called FishInTurbidWater, which consists of video footage gathered from turbid waters, and quickly and weakly labeled it (i.e., giving higher priority to speed over accuracy) in software running at 4x fast-forward. Next, we designed and implemented a fish detection model based on semi-supervised contrastive learning, which is pretrained in a self-supervised manner on unlabeled data and then fine-tuned with a small fraction (20%) of our weakly labeled FishInTurbidWater data. In the next step, we trained, using our weakly labeled data, a novel weakly-supervised ensemble DNN with transfer learning from ImageNet. The results show that our semi-supervised contrastive model leads to a more than 20-fold faster turnaround time between dataset collection and result generation, with reasonably high accuracy (89%). At the same time, the proposed weakly-supervised ensemble model can detect fish in turbid waters with high (94%) accuracy, while still cutting the development time by a factor of four compared to fully-supervised models trained on carefully labeled datasets. Our dataset and code are publicly available at the hyperlink FishInTurbidWater.
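For the contrastive pretraining stage, a minimal NumPy sketch of the NT-Xent (SimCLR-style) loss is shown below; this is the general family of self-supervised objectives such a contrastive model belongs to, not the authors' code, and the embedding shapes are placeholders.

```python
# Minimal NT-Xent contrastive loss over two augmented views of the same frames.
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """z1, z2: (N, D) embeddings of two augmented views of the same frames."""
    z = np.concatenate([z1, z2], axis=0)                  # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)      # cosine similarity
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)                        # exclude self-pairs
    n = z1.shape[0]
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), positives].mean()

rng = np.random.default_rng(0)
view_a = rng.normal(size=(8, 16))                    # embeddings of 8 frames
view_b = view_a + 0.05 * rng.normal(size=(8, 16))    # slightly perturbed views
print(f"NT-Xent loss: {nt_xent_loss(view_a, view_b):.3f}")
```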

    Variation-aware binarized memristive networks

    Get PDF
    The quantization of weights to binary states in Deep Neural Networks (DNNs) can replace resource-hungry multiply-accumulate operations with simple accumulations. Such Binarized Neural Networks (BNNs) exhibit greatly reduced resource and power requirements. In addition, memristors have been shown to be promising synaptic weight elements in DNNs. In this paper, we propose and simulate novel Binarized Memristive Convolutional Neural Network (BMCNN) architectures employing hybrid weight and parameter representations. We train the proposed architectures offline and then map the trained parameters to our binarized memristive devices for inference. To take into account the variations in memristive devices, and to study their effect on performance, we introduce variations in R_ON and R_OFF. Moreover, we introduce means to mitigate the adverse effect of memristive variations in our proposed networks. Finally, we benchmark our BMCNNs and variation-aware BMCNNs using the MNIST dataset.
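A hedged sketch of how binarized weights can be mapped onto memristive devices with R_ON/R_OFF variation follows; the nominal resistances and the Gaussian spread are illustrative assumptions, not the paper's device parameters.

```python
# Map binary weights to memristor resistances with device-to-device variation.
import numpy as np

rng = np.random.default_rng(3)
R_ON_NOM, R_OFF_NOM = 10e3, 100e3     # nominal low / high resistance (ohms)
SIGMA = 0.10                          # 10% relative device-to-device spread

def map_binary_weights(w_bin):
    """w_bin in {-1, +1}; +1 -> R_ON device, -1 -> R_OFF device."""
    nominal = np.where(w_bin > 0, R_ON_NOM, R_OFF_NOM)
    return nominal * (1.0 + SIGMA * rng.standard_normal(w_bin.shape))

def effective_weights(resistances):
    """Read back conductances and rescale to the [-1, +1] weight range."""
    g = 1.0 / resistances
    g_on, g_off = 1.0 / R_ON_NOM, 1.0 / R_OFF_NOM
    return 2.0 * (g - g_off) / (g_on - g_off) - 1.0

w = np.array([1, -1, 1, 1, -1], dtype=float)
r = map_binary_weights(w)
print(np.round(effective_weights(r), 2))   # near +/-1, perturbed by variation
```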

    Ensemble Machine Learning Model Trained on a New Synthesized Dataset Generalizes Well for Stress Prediction Using Wearable Devices

    Full text link
    Introduction. We investigate the generalization ability of models built on datasets containing a small number of subjects, recorded under single study protocols. Next, we propose and evaluate methods for combining these datasets into a single, large dataset. Finally, we propose and evaluate an ensemble technique that combines gradient boosting with an artificial neural network, and measure its predictive power on new, unseen data. Methods. Sensor biomarker data from six public datasets were utilized in this study. To test model generalization, we developed a gradient boosting model trained on one dataset (SWELL) and tested its predictive power on two datasets previously used in other studies (WESAD, NEURO). Next, we merged four small datasets (SWELL, NEURO, WESAD, UBFC-Phys) to provide a combined total of 99 subjects. In addition, we utilized random sampling combined with another dataset (EXAM) to build a larger training dataset consisting of 200 synthesized subjects. Finally, we developed an ensemble model that combines our gradient boosting model with an artificial neural network, and tested it on two additional, unseen publicly available stress datasets (WESAD and Toadstool). Results. Our method delivers a robust stress measurement system capable of achieving 85% predictive accuracy on new, unseen validation data, a 25% performance improvement over single models trained on small datasets. Conclusion. Models trained on small, single study protocol datasets do not generalize well to new, unseen data and lack statistical power. Machine learning models trained on a dataset containing a larger number of varied study subjects capture physiological variance better, resulting in more robust stress detection. Comment: 37 pages, 11 figures.
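A hedged illustration of the ensemble idea, combining a gradient boosting model with a neural network via soft voting on synthetic stand-in features (scikit-learn; not the authors' exact pipeline or datasets):

```python
# Soft-voting ensemble of gradient boosting and an ANN on synthetic features.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# synthetic placeholder for sensor-biomarker features from wearable devices
X, y = make_classification(n_samples=600, n_features=12, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("gbm", GradientBoostingClassifier(random_state=0)),
        ("ann", MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500,
                              random_state=0)),
    ],
    voting="soft",     # average the predicted probabilities of both models
)
ensemble.fit(X_tr, y_tr)
print(f"held-out accuracy: {ensemble.score(X_te, y_te):.2f}")
```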

    Internet of Underwater Things and Big Marine Data Analytics -- A Comprehensive Survey

    Full text link
    The Internet of Underwater Things (IoUT) is an emerging communication ecosystem developed for connecting underwater objects in maritime and underwater environments. The IoUT technology is intricately linked with intelligent boats and ships, smart shores and oceans, automatic marine transportation, positioning and navigation, underwater exploration, disaster prediction and prevention, as well as with intelligent monitoring and security. The IoUT has an influence at various scales, ranging from a small scientific observatory, to a mid-sized harbor, to global oceanic trade. The network architecture of IoUT is intrinsically heterogeneous and should be sufficiently resilient to operate in harsh environments. This creates major challenges in terms of underwater communications, whilst relying on limited energy resources. Additionally, the volume, velocity, and variety of data produced by sensors, hydrophones, and cameras in IoUT is enormous, giving rise to the concept of Big Marine Data (BMD), which has its own processing challenges. Hence, conventional data processing techniques will falter, and bespoke Machine Learning (ML) solutions have to be employed for automatically learning the specific BMD behavior and features, facilitating knowledge extraction and decision support. The motivation of this paper is to comprehensively survey the IoUT, BMD, and their synthesis. It also aims to explore the nexus of BMD and ML. We set out from underwater data collection and then discuss the family of IoUT data communication techniques with an emphasis on the state-of-the-art research challenges. We then review the suite of ML solutions suitable for BMD handling and analytics. We treat the subject deductively from an educational perspective, critically appraising the material surveyed. Comment: 54 pages, 11 figures, 19 tables, IEEE Communications Surveys & Tutorials, peer-reviewed academic journal.